-
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
We study the problem of agnostic PAC reinforcement learning (RL): given a policy class Pi, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an epsilon-suboptimal policy with respect to Pi? To that end, we introduce a new complexity measure, called the spanning capacity, that depends solely on the set Pi and is independent of the MDP dynamics. With a generative model, we show that the spanning capacity characterizes PAC learnability for every policy class Pi. For online RL, however, the situation is more subtle: we show that there exists a policy class Pi with bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between the generative and online access models (as well as between deterministic and stochastic MDPs under online access). On the positive side, we identify an additional sunflower structure that, in conjunction with bounded spanning capacity, enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as recent developments in reachable-state identification and policy evaluation in reward-free exploration.
Free, publicly accessible full text available March 30, 2026
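The learning goal stated in the abstract above (an epsilon-suboptimal policy with respect to Pi) can be written out as a standard PAC-style guarantee. This is only a sketch under assumed notation that does not appear in the record itself: V^π for the expected cumulative reward of a policy π in the unknown MDP, π̂ for the policy the learner returns, and δ for a failure probability.

```latex
% Assumed notation (not from the source record):
%   V^{\pi}     expected cumulative reward of policy \pi in the unknown MDP
%   \hat{\pi}   policy returned by the learner after its rounds of interaction
%   \delta      allowed failure probability of the learner
\[
  \Pr\!\left[\, V^{\hat{\pi}} \;\ge\; \max_{\pi \in \Pi} V^{\pi} - \varepsilon \,\right] \;\ge\; 1 - \delta .
\]
```

"Agnostic" here means the benchmark is the best policy inside Pi, with no assumption that Pi contains an optimal policy for the MDP.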
-
Jia, Zeyu; Li, Gene; Rakhlin, Alexander; Sekhari, Ayush; Srebro, Nati (Advances in Neural Information Processing Systems 36)
-
Jia, Zeyu; Polyanskiy, Yury; Wu, Yihong (COLT 2023)
-
Jia, Zeyu; Polyanskiy, Yury; Wu, Yihong (Proceedings of Machine Learning Research)